Archiving the Relaxed Consistency Web
The historical, cultural, and intellectual importance of archiving the web
has been widely recognized. Today, all countries with high Internet penetration
rates have established high-profile archiving initiatives to crawl and archive
the fast-disappearing web content for long-term use. As web technologies
evolve, established web archiving techniques face challenges. This paper
focuses on the potential impact of the relaxed consistency web design on
crawler-driven web archiving. Relaxed-consistency websites may disseminate,
albeit ephemerally, inaccurate and even contradictory information. If captured
and preserved in the web archives as historical records, such information will
degrade the overall archival quality. To assess the extent of such quality
degradation, we build a simplified feed-following application and simulate its
operation with synthetic workloads. The results indicate that a non-trivial
portion of a relaxed consistency web archive may contain observable
inconsistency, and the inconsistency window may extend significantly longer
than that observed at the data store. We discuss the nature of such quality
degradation and propose a few possible remedies.
Comment: 10 pages, 6 figures, CIKM 201
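The capture-time inconsistency the abstract describes can be illustrated with a toy simulation (a minimal sketch; the delay, time horizon, and function names here are hypothetical, not the paper's actual feed-following workload):

```python
import random

def simulate_archive_inconsistency(num_posts=1000, propagation_delay=5.0,
                                   horizon=100.0, seed=0):
    """Toy model of crawling a feed-following site under relaxed consistency.

    Each post appears on its author's page immediately but reaches follower
    feeds only after a fixed propagation delay. A snapshot taken at a random
    crawl time is inconsistent if the author's page shows a post that the
    follower feed does not yet contain.
    """
    random.seed(seed)
    inconsistent = 0
    for _ in range(num_posts):
        post_time = random.uniform(0, horizon)
        crawl_time = random.uniform(0, horizon + propagation_delay)
        on_author_page = crawl_time >= post_time
        in_follower_feed = crawl_time >= post_time + propagation_delay
        if on_author_page and not in_follower_feed:
            inconsistent += 1
    return inconsistent / num_posts

rate = simulate_archive_inconsistency()
```

Even this crude model shows how a data-store inconsistency window translates into archived snapshots that disagree with themselves, and how a longer propagation delay widens the fraction of inconsistent captures.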
RobustMQ: Benchmarking Robustness of Quantized Models
Quantization has emerged as an essential technique for deploying deep neural
networks (DNNs) on devices with limited resources. However, quantized models
exhibit vulnerabilities when exposed to various noises in real-world
applications. Despite the importance of evaluating the impact of quantization
on robustness, existing research on this topic is limited and often disregards
established principles of robustness evaluation, resulting in incomplete and
inconclusive findings. To address this gap, we thoroughly evaluated the
robustness of quantized models against various noises (adversarial attacks,
natural corruptions, and systematic noises) on ImageNet. The comprehensive
evaluation results empirically provide valuable insights into the robustness of
quantized models in various scenarios, for example: (1) quantized models
exhibit higher adversarial robustness than their floating-point counterparts,
but are more vulnerable to natural corruptions and systematic noises; (2) in
general, increasing the quantization bit-width results in a decrease in
adversarial robustness, an increase in natural robustness, and an increase in
systematic robustness; (3) among corruption methods, \textit{impulse noise} and
\textit{glass blur} are the most harmful to quantized models, while
\textit{brightness} has the least impact; (4) among systematic noises, the
\textit{nearest neighbor interpolation} has the highest impact, while bilinear
interpolation, cubic interpolation, and area interpolation are the three least
harmful. Our research contributes to advancing the robust quantization of
models and their deployment in real-world scenarios.
Comment: 15 pages, 7 figure
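The bit-width trade-off the abstract measures starts from the basic quantization step, which can be sketched as follows (a minimal uniform-quantization illustration with made-up weights, not RobustMQ's actual evaluation pipeline):

```python
def quantize_uniform(values, bits):
    """Uniformly quantize floats to 2**bits levels over their observed range.

    A minimal sketch of post-training uniform quantization; real deployments
    use per-channel scales, calibration data, and hardware-aware rounding.
    """
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]

weights = [-0.9, -0.3, 0.0, 0.25, 0.7]
q4 = quantize_uniform(weights, 4)   # finer grid, smaller rounding error
q2 = quantize_uniform(weights, 2)   # coarser grid, larger rounding error
err4 = max(abs(a - b) for a, b in zip(weights, q4))
err2 = max(abs(a - b) for a, b in zip(weights, q2))
```

Lower bit-widths introduce larger rounding error, which interacts differently with adversarial, natural, and systematic noise, as the evaluation above reports.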
Oil Saturation Boundary for Partial Oil and Partial Water Recognition in the Oil-Water Transition Zone
With the development of oilfields, oil reserves in the oil-water transition zone have gradually become a significant part of comprehensive reserves. The partial oil layer of the oil-water transition zone, in particular, has exploitation potential, but identifying the partial oil layer has become a difficulty in development planning for the zone. Over the years, there has been little research on the oil-water transition zone. This paper mainly studies the oil saturation boundaries for partial oil and partial water recognition. Two major approaches, theoretical calculation methods and the cumulative probability curve, have been applied in the study. The results provide a basis for further perforation development and dynamic adjustment.
Key words: The oil-water transition zone; Partial oil layer; Oil saturation; Theoretical calculation methods; Cumulative probability curve
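A cutoff search in the spirit of the cumulative probability curve can be sketched as follows (a hypothetical illustration with made-up saturation samples; the paper's actual boundaries also rely on theoretical calculation methods):

```python
def saturation_cutoff(oil_samples, water_samples, step=0.01):
    """Pick an oil-saturation cutoff separating 'partial oil' from
    'partial water' intervals by minimizing total misclassification
    of two labeled sample sets.

    A toy stand-in for reading the crossover of two cumulative
    probability curves; all sample values below are invented.
    """
    lo = min(oil_samples + water_samples)
    hi = max(oil_samples + water_samples)
    best_cut, best_err = lo, float("inf")
    cut = lo
    while cut <= hi:
        # Oil intervals below the cutoff and water intervals at or
        # above it would be misclassified by this boundary.
        err = (sum(s < cut for s in oil_samples)
               + sum(s >= cut for s in water_samples))
        if err < best_err:
            best_cut, best_err = cut, err
        cut += step
    return best_cut

oil = [0.55, 0.60, 0.62, 0.68, 0.70]      # hypothetical oil-layer saturations
water = [0.30, 0.35, 0.40, 0.45, 0.52]    # hypothetical water-layer saturations
cut = saturation_cutoff(oil, water)
```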
Dynamic Quality Metric Oriented Error-bounded Lossy Compression for Scientific Datasets
With the ever-increasing execution scale of high performance computing (HPC)
applications, vast amounts of data are being produced by scientific research
every day. Error-bounded lossy compression has been considered a very promising
solution to address the big-data issue for scientific applications because it
can significantly reduce the data volume with low time cost while allowing
users to control the compression errors with a specified error bound. The
existing error-bounded lossy compressors, however, are all developed based on
inflexible designs or compression pipelines, which cannot adapt to diverse
compression quality requirements/metrics favored by different application
users. In this paper, we propose a novel dynamic quality metric oriented
error-bounded lossy compression framework, namely QoZ. The detailed
contribution is three-fold. (1) We design a novel highly-parameterized
multi-level interpolation-based data predictor, which can significantly improve
the overall compression quality with the same compressed size. (2) We design
the error-bounded lossy compression framework QoZ based on the adaptive
predictor, which can auto-tune the critical parameters and optimize the
compression result according to user-specified quality metrics during online
compression. (3) We evaluate QoZ carefully by comparing its compression quality
with multiple state-of-the-art compressors on various real-world scientific application
datasets. Experiments show that, compared with the second-best lossy
compressor, QoZ can achieve up to 70% compression ratio improvement under the
same error bound, up to 150% compression ratio improvement under the same PSNR,
or up to 270% compression ratio improvement under the same SSIM.
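The predictor-plus-error-bound idea behind compressors in this family can be sketched in a few lines (a single-level, previous-value-predictor toy, not QoZ's multi-level interpolation or online auto-tuning):

```python
def compress_error_bounded(data, error_bound):
    """Sketch of prediction-based, error-bounded lossy compression:
    predict each point from its already-reconstructed neighbor, then
    quantize the residual to an integer multiple of 2*error_bound.

    Returns the integer codes (what would be entropy-coded) and the
    reconstruction the decoder would produce from them.
    """
    quantized, recon = [], []
    for i, x in enumerate(data):
        pred = recon[i - 1] if i else 0.0             # previous-value predictor
        code = round((x - pred) / (2 * error_bound))  # residual quantization
        quantized.append(code)
        recon.append(pred + code * 2 * error_bound)   # decoder-side value
    return quantized, recon

data = [1.00, 1.02, 1.05, 1.50, 1.49]
codes, recon = compress_error_bounded(data, error_bound=0.05)
```

Because each residual is quantized against the reconstructed (not original) neighbor, error never accumulates beyond the user-specified bound, which is the defining property of error-bounded lossy compression; near-constant runs collapse to zero codes, which compress extremely well.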
EvLog: Evolving Log Analyzer for Anomalous Logs Identification
Software logs record system activities, aiding maintainers in identifying the
underlying causes for failures and enabling prompt mitigation actions. However,
maintainers need to inspect a large volume of daily logs to identify the
anomalous logs that reveal failure details for further diagnosis. Thus, how to
automatically distinguish these anomalous logs from normal logs becomes a
critical problem. Existing approaches alleviate the burden on software
maintainers, but they are built upon a critical yet unrealistic assumption:
logging statements in the software remain unchanged. While software keeps
evolving, our empirical study finds that evolving software brings three
challenges: log parsing errors, evolving log events, and unstable log
sequences.
In this paper, we propose a novel unsupervised approach named Evolving Log
analyzer (EvLog) to mitigate these challenges. We first build a multi-level
representation extractor to process logs without parsing to prevent errors from
the parser. The multi-level representations preserve the essential semantics of
logs while leaving out insignificant changes in evolving events. EvLog then
implements an anomaly discriminator with an attention mechanism to identify the
anomalous logs and avoid the issue brought by the unstable sequence. EvLog has
shown effectiveness in two real-world system evolution log datasets with an
average F1 score of 0.955 and 0.847 in the intra-version setting and
inter-version setting, respectively, which outperforms other state-of-the-art
approaches by a wide margin. To the best of our knowledge, this is the first
study on tackling anomalous logs over software evolution. We believe our work
sheds new light on the impact of software evolution and offers corresponding
solutions to the log analysis community.
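Why a parsing-free representation tolerates evolving log events can be illustrated with a deliberately simple token-overlap measure (a toy sketch with made-up log lines; EvLog's actual multi-level extractor is learned, not Jaccard-based):

```python
def token_set(message):
    """Represent a raw log message as its set of lowercase tokens,
    with no template parsing step that could fail on evolved events."""
    return set(message.lower().split())

def similarity(a, b):
    """Jaccard similarity between two raw log messages."""
    sa, sb = token_set(a), token_set(b)
    return len(sa & sb) / len(sa | sb)

old = "Connection to node 5 lost, retrying"
new = "Connection to node 5 lost, retry scheduled"   # evolved wording, same event
anomaly = "Checksum mismatch in block 9f3a"          # genuinely different event

same_event_score = similarity(old, new)
anomaly_score = similarity(old, anomaly)
```

A small rewording of a known event keeps most of its tokens, so it scores close to the original, while a truly anomalous message shares almost nothing; a learned semantic representation extends this intuition beyond literal token overlap.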
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
This paper proposes a novel contrastive learning framework, called FOCAL, for
extracting comprehensive features from multimodal time-series sensing signals
through self-supervised training. Existing multimodal contrastive frameworks
mostly rely on the shared information between sensory modalities, but do not
explicitly consider the exclusive modality information that could be critical
to understanding the underlying sensing physics. In addition, existing
contrastive frameworks for time series have not handled temporal information
locality appropriately. FOCAL solves these challenges by making the following
contributions: First, given multimodal time series, it encodes each modality
into a factorized latent space consisting of shared features and private
features that are orthogonal to each other. The shared space emphasizes feature
patterns consistent across sensory modalities through a modal-matching
objective. In contrast, the private space extracts modality-exclusive
information through a transformation-invariant objective. Second, we propose a
temporal structural constraint for modality features, such that the average
distance between temporally neighboring samples is no larger than that of
temporally distant samples. Extensive evaluations are performed on four
multimodal sensing datasets with two backbone encoders and two classifiers to
demonstrate the superiority of FOCAL. It consistently outperforms the
state-of-the-art baselines in downstream tasks with a clear margin, under
different ratios of available labels. The code and self-collected dataset are
available at https://github.com/tomoyoshki/focal.
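The factorized latent space rests on keeping shared and private features orthogonal, which a training objective can encourage with a penalty term like this (a minimal sketch over hand-picked vectors; FOCAL's full objective adds modality-matching and transformation-invariance terms over learned encoders):

```python
def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonality_penalty(shared, private):
    """Squared inner product of shared and private features: exactly zero
    when the two subspace embeddings are orthogonal, positive otherwise.
    A factorized-latent-space loss could add this term per sample."""
    return dot(shared, private) ** 2

shared = [1.0, 0.0, 2.0]       # hypothetical shared-space embedding
private_good = [0.0, 3.0, 0.0]  # orthogonal: carries only exclusive information
private_bad = [1.0, 1.0, 0.0]   # overlaps with the shared direction
```

Minimizing this penalty during training pushes the private space to encode only modality-exclusive information that the shared space does not already capture.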